Comparing SARS-CoV-2 antigen-detection rapid diagnostic tests for COVID-19 self-testing/self-sampling with molecular and professional-use tests: a systematic review and meta-analysis

Self-testing is an effective tool to bridge the testing gap for several infectious diseases; however, its performance in detecting SARS-CoV-2 using antigen-detection rapid diagnostic tests (Ag-RDTs) has not been systematically reviewed. This study aimed to inform WHO guidelines by evaluating the accuracy of COVID-19 self-testing and self-sampling coupled with professional Ag-RDT conduct and interpretation. Articles on this topic were searched until November 7th, 2022. Concordance between self-testing/self-sampling and fully professional-use Ag-RDTs was assessed using Cohen’s kappa. Bivariate meta-analysis yielded pooled performance estimates. Quality and certainty of evidence were evaluated using QUADAS-2 and GRADE tools. Among 43 studies included, twelve reported on self-testing, and 31 assessed self-sampling only. Around 49.6% showed low risk of bias. Overall concordance with professional-use Ag-RDTs was high (kappa 0.91 [95% confidence interval (CI) 0.88–0.94]). Comparing self-testing/self-sampling to molecular testing, the pooled sensitivity and specificity were 70.5% (95% CI 64.3–76.0) and 99.4% (95% CI 99.1–99.6), respectively. Higher sensitivity (i.e., 93.6% [95% CI 90.4–96.8] for Ct < 25) was estimated in subgroups with higher viral loads using Ct values as a proxy. Despite high heterogeneity among studies, COVID-19 self-testing/self-sampling exhibits high concordance with professional-use Ag-RDTs. This suggests that self-testing/self-sampling can be offered as part of COVID-19 testing strategies. Trial registration: PROSPERO: CRD42021250706.


Search strategy
We searched the databases MEDLINE (via PubMed), Web of Science, medRxiv, and bioRxiv (via Europe PMC), using search terms developed with an experienced medical librarian (MGr) that combined subject headings (when applicable) and text words for the concepts of the search question. The main search terms were "Severe Acute Respiratory Syndrome Coronavirus 2," "COVID-19," "Betacoronavirus," "Coronavirus," and "Point of Care Testing"; they were checked against an expert-assembled list of relevant papers. The full list of search terms is available in the supplementary material (Supplement Text 2 Search Strategy). Furthermore, we looked for relevant studies on the FIND website (https://www.finddx.org/sarscov2-eval-antigen/). We conducted the search without applying any language, age, or geographic restrictions from inception up until November 7th, 2022.

Eligibility criteria
We included studies evaluating the accuracy of self-testing and/or self-sampling using commercially available Ag-RDTs to establish a diagnosis of SARS-CoV-2 infection against RT-PCR as the reference standard. In studies assessing self-sampling, the Ag-RDT (including readout and interpretation) was performed by a professional. Sampling conducted or assisted by caregivers was included as self-sampling. RT-PCR samples were eligible if they were either self-collected or professionally collected, without a restriction on sample type (henceforth referred to as 'RT-PCR').
We included all studies reporting on any population, irrespective of age, symptom presence, or study location. We considered cohort studies, nested cohort studies, case-control, cross-sectional studies, and randomized controlled trials (RCTs). We included both peer-reviewed publications and preprints. We excluded studies in which persons underwent testing for the purposes of monitoring or ending quarantine. In addition, publications with a sample size under ten were excluded to minimize bias in clinical performance estimates.

Assessment of methodological quality
The quality of clinical accuracy studies was assessed by applying the quality assessment of studies of diagnostic accuracy (QUADAS-2) tool, which was adjusted to the needs of this review 16. Details can be found in the supplementary material (Supplement Text 3 QUADAS).

Assessment of certainty of evidence (CoE)
We defined three individual outcomes for this review: (1) concordance between self-testing/self-sampling coupled with professional Ag-RDT conduct and interpretation and fully professional-use Ag-RDTs, calculating Cohen's kappa as well as positive percentage agreement (PPA), negative percentage agreement (NPA), and overall percentage agreement (OPA), (2) sensitivity, and (3) specificity against RT-PCR performed on a self-collected or professionally-collected sample as reference.
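As an illustration of outcomes (2) and (3), the following minimal sketch computes sensitivity and specificity of an index test against RT-PCR from a 2 × 2 table. The Wilson score interval is an assumption chosen here for a single-study confidence interval; it is not the bivariate model used for the pooled estimates in this review. The function names and the counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_vs_reference(tp, fp, fn, tn):
    """Sensitivity and specificity of the index test against the reference.

    tp: index positive, reference positive
    fp: index positive, reference negative
    fn: index negative, reference positive
    tn: index negative, reference negative
    """
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts, for illustration only (not data from the review)
result = accuracy_vs_reference(tp=70, fp=5, fn=30, tn=895)
print(result)
```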
Certainty of evidence (CoE) was assessed following the GRADE guidelines for each individual outcome 17. After rating the respective study type (e.g., RCT or observational trial), each outcome was independently evaluated according to five categories: study design, risk of bias (RoB), inconsistency, indirectness, and imprecision.

Assessment of independence from manufacturers
We examined whether a study received financial support from a test manufacturer (including free provision of Ag-RDTs), whether any study authors were affiliated with the manufacturer, and whether a respective conflict of interest was declared. If at least one of these conditions was met, the study was deemed not independent from the test manufacturer; otherwise, it was considered independent.
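The classification rule above can be written as a small predicate. This is only a sketch: the record type and its field names are hypothetical and do not correspond to the review's actual data extraction form.

```python
from dataclasses import dataclass


@dataclass
class ManufacturerLinks:
    """Hypothetical record of the three conditions examined per study."""
    manufacturer_support: bool  # financial support, incl. free provision of Ag-RDTs
    author_affiliated: bool     # any author affiliated with the manufacturer
    coi_declared: bool          # a respective conflict of interest was declared


def is_independent(links: ManufacturerLinks) -> bool:
    """A study counts as independent only if none of the conditions is met."""
    return not (
        links.manufacturer_support
        or links.author_affiliated
        or links.coi_declared
    )


# Free test kits alone are enough to classify a study as not independent
print(is_independent(ManufacturerLinks(True, False, False)))
```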

Statistical analysis and data synthesis
We extracted data from eligible studies using a standardized data extraction form. Wherever possible, we recalculated performance estimates based on the extracted data or contacted authors to provide additional information on concordance between self-tested and professionally tested Ag-RDTs. The final data set used is accessible under https://doi.org/10.11588/data/P9JEPG.
We calculated Cohen's kappa as a measure of concordance, together with its variance and 95% confidence intervals (CIs), for comparison of results with fully professional-use Ag-RDTs. If four or more studies with at least 20 positive samples were available, we conducted a meta-analysis of Cohen's kappa using the "metafor" package version 3.4-0 in R 18. PPA, NPA, and OPA were additionally calculated using the following formulas when comparing self-testing/self-sampling with professional-use Ag-RDTs (with a, b, c, and d denoting the cells of the 2 × 2 table as defined in the caption of Fig. 3):

$$\mathrm{PPA} = \frac{a}{a+c} \times 100\%; \qquad \mathrm{NPA} = \frac{d}{b+d} \times 100\%; \qquad \mathrm{OPA} = \frac{a+d}{a+b+c+d} \times 100\%$$

We derived the estimates for sensitivity and specificity against RT-PCR and performed meta-analysis using a bivariate model when at least four data sets, each with at least 20 positive samples, were available (meta-analysis was implemented with the "reitsma" command from the R package "mada," version 0.5.11). If fewer than four studies were available for an outcome, only a descriptive analysis was performed, and accuracy ranges were reported. Univariate random-effects inverse variance meta-analysis was performed (using the "metaprop" and "metagen" commands from the R package "meta," version 5.5-0) for the pooled sensitivity analysis per Ct values.

We predefined subgroups for meta-analysis based on the following characteristics: Ct value range (< 20, < 25, < 30, ≥ 20, ≥ 25, ≥ 30), sampling and testing procedure in accordance with manufacturer and/or study team instructions ('IFU-conforming' versus 'not IFU-conforming'), patient age ('< 18 years' vs. '≥ 18 years'), presence of symptoms ('symptomatic' versus 'asymptomatic'), and duration of symptoms ('DoS ≤ 7 days' vs. 'DoS > 7 days').
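The concordance measures can be sketched from a single 2 × 2 table as follows, with a, b, c, and d as defined in the caption of Fig. 3. Two points are assumptions rather than statements about the review's implementation: PPA and NPA are taken in their usual form with the professional-use test as comparator, and the standard error of kappa uses the simple large-sample approximation. The counts are hypothetical.

```python
import math


def agreement_stats(a, b, c, d, z=1.96):
    """Cohen's kappa with PPA, NPA, and OPA for a 2 x 2 agreement table.

    a: self-test positive, professional test positive
    b: self-test positive, professional test negative
    c: self-test negative, professional test positive
    d: self-test negative, professional test negative
    """
    n = a + b + c + d
    p_obs = (a + d) / n  # observed agreement
    # Agreement expected by chance from the marginal totals
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Simple large-sample approximation of the standard error (assumption)
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return {
        "kappa": kappa,
        "kappa_variance": se ** 2,
        "kappa_ci": (kappa - z * se, kappa + z * se),
        "ppa": 100 * a / (a + c),
        "npa": 100 * d / (b + d),
        "opa": 100 * p_obs,
    }


# Hypothetical counts, for illustration only (not data from the review)
stats = agreement_stats(a=45, b=2, c=5, d=248)
print(stats)
```

The variance returned here is the quantity a subsequent inverse-variance meta-analysis of kappa would need for each study.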
To make the most of the heterogeneous data available, the cutoffs for the Ct value groups were relaxed by up to three points within each range (e.g., Ct value range group < 20 can include studies with Ct values ≤ 17 to ≤ 23). For the same reason, when categorizing by age, the age group < 18 years (children) included samples from persons whose age was reported as < 16 or < 18 years, whereas the age group ≥ 18 years included samples from persons whose age was reported as ≥ 16 or ≥ 18 years. Additionally, samples from the anterior nares (AN) and nasal midturbinate (NMT) were summarized as AN. IFU-conformity was judged based on the study team's information.

[...] 2B; with further details in Supplementary Fig. 1). Potential conflict of interest due to financial support from or employment by the test manufacturer was present in 17 studies (34.7%) 26,28,32,38,39,47,51,55,56,58,59,61-63. In studies focusing on self-sampling, 30 out of 36 datasets reported IFU-conforming conduct of the test, even though sampling was explicitly observed in only 22 datasets (61.1%). For studies evaluating self-testing, 26 datasets stated IFU-conformity, while for the remaining two datasets it was unclear. With a p value of 0.31 and a roughly symmetrical funnel plot, the analysis of small study effects (which may indicate publication bias) produced no significant evidence for such effects (Supplement, S2 Figure Funnel Plot).
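The relaxed Ct cutoffs and merged age bands described in the statistical analysis can be sketched as a lookup. Because the three-point tolerance lets a reported cutoff fall within reach of two predefined groups (for example, ≤ 23 is within three points of both 20 and 25), this sketch returns every eligible group and leaves the final choice to the reviewer. That behaviour, and all names below, are assumptions for illustration.

```python
from typing import List

PREDEFINED_CUTOFFS = (20, 25, 30)  # Ct cutoffs of the predefined subgroups
TOLERANCE = 3.0                    # cutoffs were relaxed by up to three points


def eligible_ct_groups(reported_cutoff: float, below: bool) -> List[str]:
    """Return all predefined Ct groups a study's reported cutoff may join.

    below=True  -> the study reports samples below (or up to) the cutoff
    below=False -> the study reports samples at or above the cutoff
    """
    sign = "<" if below else ">="
    return [
        f"{sign} {cutoff}"
        for cutoff in PREDEFINED_CUTOFFS
        if abs(cutoff - reported_cutoff) <= TOLERANCE
    ]


# Reported age thresholds merged into the two predefined age groups
AGE_GROUP = {
    "<16": "< 18 years",
    "<18": "< 18 years",
    ">=16": ">= 18 years",
    ">=18": ">= 18 years",
}

print(eligible_ct_groups(17, below=True))  # only the '< 20' group qualifies
print(eligible_ct_groups(23, below=True))  # both '< 20' and '< 25' qualify
print(AGE_GROUP["<16"])
```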

Study description
Most of the studies included in the review were conducted in high-income countries (HIC): the USA (n = 10), Germany (n = 7), the Netherlands (n = 6), UK, and Canada (n = 2, each), as well as Greece, Denmark, Japan, France, Belgium, Austria, France, Korea, and Hong Kong (n = 1, each). In contrast, seven studies were conducted in middle-income countries (MIC): India (n = 3), Brazil, Morocco, Malaysia, and China (n = 1, each) 64. No studies were performed in low-income countries. Regarding the study participants' level of education, in two studies reporting on self-testing, the majority of participants (59.6% and 98.1%) had at least a high school degree 11,24. Out of the 17 studies reporting on self-sampling, one study stated that 52.5% of participants had a higher education degree 35. Another study included only high school students (78.6%) or teachers (21.4%) 46, while two other studies included only college students 36,43. The remaining studies provided no information on the participants' educational backgrounds. Participants had prior medical training (i.e., health care worker) in three self-sampling datasets (2506 samples, 9.1%) 12,35. Participants were lay people without any medical training for six datasets totaling 5023 samples, but for the other datasets, it remained unclear. Information on the participants' professional backgrounds and prior testing experiences was only reported in one self-testing study 10. Out of the 144 participants in this study, 12 (8.3%) had prior medical training, 66 (45.8%) had undergone SARS-CoV-2 testing in the past, and four (2.8%) had performed at-home COVID-19 testing.
Most of the self-sampling data (32 datasets; 88.9%) were collected at testing or clinical sites, while for others no information was available. The sampling process was observed in 17 of the self-sampling studies (22 datasets), totaling 19,280 samples (60.6%) 12,13,37-41,43,46,48,49,51,52,54,58,61, whereas sampling was not observed in four studies (4 datasets; 10.8%) 35,36,47,59. For the remaining ten studies (10 datasets; 27.0%), it was unclear whether the sampling was observed or not 42,44,45,50,53,55-57,60,62. Overall, 78.6% of the self-testing studies were carried out at a testing site, and the testing procedure was observed (without providing instructions) by the study team in three studies (1083 samples; 2.9%) 11,28,32. A total of 27,506 samples were evaluated in the self-testing studies. Among them, 13,166 individuals presented with symptoms suggestive of a SARS-CoV-2 infection, while 10,103 persons did not show any symptoms at the time of testing. For the rest, the authors did not specify the participants' symptom status. A total of 31,069 individuals participated in the self-sampling studies, of whom 6325 had symptoms, 20,569 were asymptomatic, and 4175 had unclear symptom status.
The most frequently used Ag-RDTs across all studies were the BinaxNow nasal test by Abbott (USA, henceforth called BinaxNow) and the Standard Q nasal test by SD Biosensor (South Korea; distributed in Europe by Roche, Germany; henceforth called Standard Q nasal), with six datasets each. The BD Veritor lateral flow test for Rapid Detection of SARS-CoV-2 (Becton, Dickinson and Company, MD, US; henceforth called BD Veritor), the CLINITEST Rapid COVID-19 Antigen Test (Siemens Healthineers, Germany; henceforth called CLINITEST), and the Rapid SARS-CoV-2 Antigen Test (MP Biomedicals, CA, US; henceforth called MP Bio) were used in three datasets each.
Two self-testing studies and one self-sampling study provided additional instructional videos 24,29,45. Regarding self-testing studies, four studies provided study-specific test instructions since no manufacturer instructions for self-testing were available at the time 11,24,25,29.
Table 2 provides further information on each of the studies included in the review.

Concordance with professional-use Ag-RDTs
The concordance between self-testing and professional testing was only reported in one study, which found high concordance with a kappa of 0.92 11. The concordance between self-sampling and professional testing was reported in six studies and ranged from 0.86 to 0.93 13,35,39,49,52. We performed an exploratory analysis of concordance combining datasets from self-sampling and self-testing studies, assuming that sampling is a major driver of differences between self-testing and professional testing. We observed a pooled Cohen's kappa of 0.91 (95% CI 0.88-0.94) (Fig. 3, Supplementary Table 3).
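To illustrate how per-study kappa values and their variances combine into a pooled estimate, the following sketch applies random-effects inverse-variance pooling with the DerSimonian-Laird estimator of between-study variance. The estimator is an assumption chosen for brevity; this section does not state which estimator the "metafor" analysis used, so the sketch should not be read as a reproduction of the reported result. All input values are hypothetical.

```python
import math


def pool_random_effects(estimates, variances, z=1.96):
    """Random-effects inverse-variance pooling (DerSimonian-Laird tau^2)."""
    k = len(estimates)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight each study with the between-study variance added
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, (pooled - z * se, pooled + z * se), tau2


# Hypothetical per-study kappas and variances (not data from the review)
kappas = [0.86, 0.93, 0.91, 0.90]
kappa_variances = [0.0009, 0.0004, 0.0006, 0.0012]
pooled, ci, tau2 = pool_random_effects(kappas, kappa_variances)
print(pooled, ci, tau2)
```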
In the one study in which participants were observed as they self-tested, the majority of deviations from instructions happened during the sampling procedure, with 41.8% of participants failing to rub the swab against the nasal walls 11. Another common mistake made during sampling involved too little rotation time in the nose 11. Squeezing the tube while the swab was still inside and squeezing the tube when the swab was being removed were the steps with the most frequent deviations during the testing procedure, at 34.9% and 33.1%, respectively. These deviations, however, did not appear to impact test performance in this study, as performance against RT-PCR (sensitivity 82.5%) was acceptable and concordance with professional testing was high (kappa 0.91).

Presence of symptoms
The summary estimates of sensitivity across all studies were lower in the asymptomatic group compared to the symptomatic group, with 38.1% (95% CI 23.4-55.3) compared to 77.4% (95% CI 71.1-82.6), respectively (Fig. 4B). Specificity was above 99.0% in both subgroups. Self-testing studies, which are included in the pooled analysis, reported a range of sensitivity from 51.0% 30 to 82.5% 11 in symptomatic persons.

Duration of symptoms (DoS)
We were unable to perform a bivariate subgroup meta-analysis for a DoS of more than seven days (DoS > 7) due to an insufficient number of available datasets (n = 1). The reported sensitivity and specificity in this study were 53.8% and 100%, respectively 37. The pooled estimates of sensitivity and specificity in studies reporting DoS ≤ 7 were 79.4% (95% CI 72.7-84.8) and 99.4% (95% CI 98.9-99.7), respectively.
One self-testing study reported a sensitivity of 85.0% and a specificity of 99.1% when only samples with high viral load (≥ 7.0 log10 SARS-CoV-2 RNA copies/mL) were analyzed 11.

Age
Across all the studies included in the review, we had 32 datasets with samples from people aged 18 years and older ('≥ 18 years'), achieving a pooled sensitivity of 65.5% (95% CI 57.8-72.4) (Fig. 4D). For the '< 18 years' group, a meta-analysis was not possible, as only three datasets were available for this age group. However, the reported sensitivity in these three datasets had a comparable range to that in the '≥ 18 years' group (71.4% 48 to 92.3% 47). The pooled specificity was 99.6% (95% CI 99.2-99.8) in the '≥ 18 years' group and was above 99.6% in all datasets in the '< 18 years' group.

Certainty of evidence (CoE)
We found CoE to be high for specificity and sensitivity, and low for concordance and user errors. As for 'imprecision', we downgraded the CoE for concordance by one point due to the low number of studies and small sample size. For studies assessing concordance and user errors, 'inconsistency' was rated 'serious' and consequently also downgraded by one point, since there was only one study available (Table 3).

Discussion
Our systematic review and meta-analysis found that concordance between self-testing/self-sampling and professional testing using Ag-RDTs is very high, with a pooled Cohen's kappa of 0.91 (95% CI 0.88-0.94). Across all studies included in our review, the sensitivity of self-testing/self-sampling compared to RT-PCR (70.5% [95% CI 64.3-76.0]) was estimated to be almost the same as that of Ag-RDTs when performed by professionals (72.0% 8). The summary point estimate of sensitivity for self-testing studies (66.1% [95% CI 53.5-76.7]) was also comparable to that of professional-conducted Ag-RDT with overlapping CIs. Pooled sensitivity across self-testing and self-sampling studies increased to 77.4% (95% CI 71.1-82.6) in symptomatic persons, which is in line with the results of earlier reports that showed that presence of symptoms was a key variable affecting sensitivity of Ag-RDT and correlated with viral load 8,65. Nonetheless, neither the overall nor the symptomatic pooled sensitivity achieved the WHO sensitivity target of ≥ 80% 10. Notably, a recent meta-analysis found a pooled sensitivity of 91.1% for Ag-RDTs with self-collected nasal samples 66.
The results of subgroup analysis based on Ct values are consistent with those of earlier studies, suggesting that viral load is the main determinant of test sensitivity, irrespective of the sampling procedure or the person administering the test 8. Because Ag-RDTs detect the vast majority of SARS-CoV-2-infected persons with high viral load, self-testing becomes a valuable public health tool for identifying individuals who might be at risk of spreading the virus, especially when RT-PCR testing is not accessible. This approach aids in creating safer environments for reopening schools, workplaces, and organizing large gatherings amid the pandemic.
In addition, it is worth noting that in most cases (60.0% of datasets), the sampling process was unsupervised, which implies the general applicability of our findings to unobserved home-testing. Moreover, even though deviations from the IFU did occur in some cases, this did not appear to have an impact on test performance 11.
Although limited, the data on deviations from sampling and testing procedures demonstrated that most instruction deviations occurred during sampling, supporting our approach to conduct a pooled exploratory analysis of self-sampling and self-testing. This was additionally bolstered by a positive self-judgement of test execution and interpretation, showing confidence of lay-users to perform Ag-RDTs reliably 24. Moreover, one study reported that healthcare professionals and laypersons had a high level of readout agreement when clear instructions with illustrations were available 11. It is, however, crucial to note that the observed sampling deviations are more likely to affect test sensitivity than specificity, because poor sampling is likely to result in decreased sample quality, and thus lower viral load, leading to false negative results. Nevertheless, the results of the sensitivity analysis showed that the pooled sensitivity estimate for self-testing studies is still lower than that for self-sampling studies, which suggests that self-sampling is not the only variable influencing the differences between self-testing and professional testing. To fully understand all the variables and how they affect test performance, more research is necessary.
Our study has several strengths. We thoroughly assessed the included studies with the QUADAS-2 tool using an a-priori developed interpretation guide. In addition, our review was supported by an independent methodologist and followed rigorous methods, aligning with other WHO-commissioned reviews for self-testing. Furthermore, we report on both peer-reviewed articles and preprints from a period that nearly covers the whole pandemic. Another strength of this study lies within our subgroup analyses that provide a clearer picture of the accuracy of self-sampling and self-testing across different populations and testing approaches.
Our systematic review is, however, limited by the small number of studies that were deemed eligible (particularly those evaluating self-testing) as well as the shortcomings of these studies as revealed by the quality assessment. Another limitation is the uncertain degree to which the study participants, who included a relatively high proportion of symptomatic individuals and of individuals with prior training or testing experience, are representative of the general population. Furthermore, the majority of studies were conducted in HIC; at the same time, populations in MIC, particularly those with a high burden of HIV, were likely to have more experience with self-testing compared to HIC at the beginning of the pandemic 3. Recent reports find good concordance between COVID-19 self-testing and professionally-conducted Ag-RDTs in a middle-income country 67. Although there are differences that cannot be accounted for in this meta-analysis, our exploratory analysis found a higher pooled estimate of sensitivity in MIC compared to HIC.

Table 3. GRADE table: Should COVID-19 self-testing, defined as self-sampling, processing of the sample and self-readout using Ag-RDTs, be offered as an additional approach to professionally administered testing services? The following table summarizes the certainty of evidence according to the GRADE approach. Explanation:
a We used QUADAS-2 to assess risk of bias. The studies enrolled patients consecutively and assessed the self-testing, defined as self-sampling and self-performing the Ag-RDT, results blinded to the reference standard result (rRT-PCR or prof. Ag-RDT testing). While for one study it was not clear whether all self-tests were performed as per manufacturer's instructions, this was ensured in the other. Furthermore, we could not detect any potential bias resulting from the study flow and timing. Therefore, we did not downgrade the quality of evidence for this criterion.
b The heterogeneity/inconsistency in findings, as shown by the wide-ranging point estimates with only marginally overlapping confidence intervals, is likely to originate from differences in the study population. This is strengthened by the fact that the head-to-head comparison between self-testing and professionally testing on the same study population shows similar performance of Ag-RDTs. However, as there are only a few studies available for concordance and one study for user errors, we downgrade for these two outcomes by one.
c Following current guidance from the GRADE guideline, we do not downgrade by one point for all studies but acknowledge that the study populations are not fully representative of the populations of interest. Furthermore, the intervention did not differ from the one of interest and outcomes were reported directly, therefore indirectness was judged 'not serious'.
d The number of studies and sample size were small, and only one study reported on concordance between self-testing and professionally testing using Ag-RDTs.
e For this outcome only qualitative data, or quantitative data in isolated studies in well-described but not comparable settings were available, therefore the criterion 'imprecision' is negligible and rated as 'not serious'.

Conclusion
Self-testing and/or self-sampled testing using Ag-RDTs likely achieves similar accuracy as professional-use Ag-RDTs. In the light of the evidence presented in this review and other supporting studies, the WHO recommends COVID-19 self-testing to scale up testing capacity 68,69. Further evidence is required to assess the impact of testing strategies including self-testing on the population-level control of SARS-CoV-2 transmission.

Figure 2. (A) QUADAS assessment for risk of bias and (B) applicability.

Figure 3. Pooled concordance from self-sampling and self-testing versus professional Ag-RDTs (both sampling and testing performed by professional); Abbreviations: a = self-test & professional test positive; b = self-test positive & professional test negative; c = self-test negative & professional test positive; d = self-test & professional test negative; CI = confidence interval.

Table 1. Overview of possible sampling combinations in self-testing and self-sampling studies. Percentages might not add up to 100% as they are rounded. AN Anterior nasal, OP Oropharyngeal, NP Nasopharyngeal, Ag-RDT Antigen detection rapid diagnostic test, RT-PCR Reverse transcription polymerase chain reaction. In one study, RT-PCR sample type was unclear.